Run Status

[23]:
requests.get(
    '{}projects/{}/runs/{}/status'.format(url, project_id, run_id),
    headers={"Authorization": credentials['result_token']}
).json()
[23]:
{'current_stage': {'description': 'compute similarity scores',
  'number': 2,
  'progress': {'absolute': 25000000,
   'description': 'number of already computed similarity scores',
   'relative': 1.0}},
 'stages': 3,
 'state': 'running',
 'time_added': '2019-04-30T12:18:44.633541+00:00',
 'time_started': '2019-04-30T12:18:44.778142+00:00'}
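
The status payload nests the current stage and its progress inside the top-level run state. As a minimal sketch of how those fields relate, the hypothetical helper below (summarize_status is not part of clkhash or the service) condenses the payload shown above into a few lines. It assumes that the progress entry may be absent for some stages.

```python
# Status payload copied from the output above.
status = {
    'current_stage': {
        'description': 'compute similarity scores',
        'number': 2,
        'progress': {
            'absolute': 25000000,
            'description': 'number of already computed similarity scores',
            'relative': 1.0,
        },
    },
    'stages': 3,
    'state': 'running',
    'time_added': '2019-04-30T12:18:44.633541+00:00',
    'time_started': '2019-04-30T12:18:44.778142+00:00',
}


def summarize_status(status):
    """Hypothetical helper: condense a run status dict into readable lines."""
    stage = status['current_stage']
    lines = [
        "State: {}".format(status['state']),
        "Stage ({}/{}): {}".format(
            stage['number'], status['stages'], stage['description']),
    ]
    # Assumption: not every stage reports progress, so treat it as optional.
    progress = stage.get('progress')
    if progress is not None:
        lines.append("Progress: {:.2%}".format(progress['relative']))
    return "\n".join(lines)


print(summarize_status(status))
```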

Now, after some delay (depending on the size of the data), we can fetch the results. We could poll the REST API directly using requests, but for simplicity we will use the watch_run_status function provided in clkhash.rest_client.

Note that server, rather than url, is passed to these functions.
[24]:
import clkhash.rest_client
from IPython.display import clear_output

# Redraw the latest status in place as each update arrives
for update in clkhash.rest_client.watch_run_status(server, project_id, run_id, credentials['result_token'], timeout=300):
    clear_output(wait=True)
    print(clkhash.rest_client.format_run_status(update))

State: completed
Stage (3/3): compute output
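
For comparison, polling the status endpoint directly might look roughly like the sketch below. It is not how watch_run_status is implemented. The names wait_for_run and make_status_fetcher are hypothetical, and treating 'error' as a terminal state alongside 'completed' is an assumption; only 'running' and 'completed' appear in the outputs above.

```python
import time


def make_status_fetcher(url, project_id, run_id, token):
    """Hypothetical helper: build a function that fetches the run status once."""
    import requests  # imported here so the polling loop itself has no dependencies

    def fetch_status():
        response = requests.get(
            '{}projects/{}/runs/{}/status'.format(url, project_id, run_id),
            headers={"Authorization": token})
        response.raise_for_status()
        return response.json()

    return fetch_status


def wait_for_run(fetch_status, timeout=300, interval=1.0,
                 terminal_states=('completed', 'error'), sleep=time.sleep):
    """Call fetch_status() until the run reaches a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get('state') in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "run still in state {!r}".format(status.get('state')))
        sleep(interval)


# Usage in this notebook would be along the lines of:
#   fetch = make_status_fetcher(url, project_id, run_id, credentials['result_token'])
#   final_status = wait_for_run(fetch, timeout=300)
```

Passing the fetch function in as an argument keeps the loop easy to exercise without a running server.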
[25]:
import json

# Fetch the run result as text and parse the JSON into a dict
data = json.loads(clkhash.rest_client.run_get_result_text(
    server,
    project_id,
    run_id,
    credentials['result_token']))

The result is a one-to-one mapping between rows of the two datasets whose similarity exceeded the given threshold.

[30]:
for i in range(10):
    print("a[{}] maps to b[{}]".format(i, data['mapping'][str(i)]))
print("...")
a[0] maps to b[1449]
a[1] maps to b[2750]
a[2] maps to b[4656]
a[3] maps to b[4119]
a[4] maps to b[3306]
a[5] maps to b[2305]
a[6] maps to b[3944]
a[7] maps to b[992]
a[8] maps to b[4612]
a[9] maps to b[3629]
...
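
The lookup above uses str(i) because JSON object keys are always strings. A minimal sketch of turning the mapping into integer pairs is shown below. mapping_to_pairs is a hypothetical helper, and the sample contains only the first five entries printed above; values are passed through int() in case the service returns them as strings.

```python
# First entries of data['mapping'], copied from the output above.
sample_mapping = {'0': 1449, '1': 2750, '2': 4656, '3': 4119, '4': 3306}


def mapping_to_pairs(mapping):
    """Hypothetical helper: return (row_in_a, row_in_b) integer pairs sorted by row_in_a."""
    return sorted((int(a), int(b)) for a, b in mapping.items())


pairs = mapping_to_pairs(sample_mapping)

# In a one-to-one mapping no row of b is matched more than once.
assert len({b for _, b in pairs}) == len(pairs)

for a, b in pairs:
    print("a[{}] maps to b[{}]".format(a, b))
```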

In this dataset there are 5000 records in common. With the chosen threshold and schema we currently retrieve:

[31]:
len(data['mapping'])
[31]:
4853
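
As a quick worked figure, the number of retrieved pairs can be related to the known overlap. The sketch below uses only the two numbers stated above. Since it counts returned pairs without checking that each one is correct, the ratio is an upper bound on recall rather than the recall itself.

```python
records_in_common = 5000   # known overlap between the two datasets
pairs_retrieved = 4853     # len(data['mapping']) from the cell above

fraction_retrieved = pairs_retrieved / records_in_common
pairs_missing = records_in_common - pairs_retrieved

print("{:.2%} of the common records were paired".format(fraction_retrieved))
print("{} common records were left unpaired".format(pairs_missing))
```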

Cleanup

If you want, you can delete the run and project from the anonlink-entity-service.

[44]:
# url is joined without an extra slash, as in the status request earlier
requests.delete(
    "{}projects/{}".format(url, project_id),
    headers={"Authorization": credentials['result_token']})
[44]:
<Response [403]>
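
When reading a response such as the one above, it can help to translate the status code into its standard reason phrase. The sketch below uses only the Python standard library. describe_response is a hypothetical helper, and which codes the service returns for a successful delete is not assumed here.

```python
from http import HTTPStatus


def describe_response(status_code):
    """Hypothetical helper: label an HTTP status code as success or failure."""
    status = HTTPStatus(status_code)
    outcome = "succeeded" if 200 <= status_code < 300 else "failed"
    return "request {}: {} {}".format(outcome, status_code, status.phrase)


print(describe_response(403))
```

With a requests response object this would be called as describe_response(response.status_code).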